ABSTRACT


A time series is a sequence of observations recorded in order over time.
Predicting a time series is mostly about predicting the future, and the
quality of a time series forecasting model is determined by its success in
doing so. This is often at the expense of being able to explain why a
specific prediction was made. The Box-Jenkins approach assumes that the time
series is stationary, and therefore recommends differencing a non-stationary
series one or more times to obtain a stationary one. This yields the ARIMA
model, the "I" standing for "Integrated". LSTM networks, comparable to
computer memory, use a gated cell to store information. Like other recurrent
networks, LSTM cells also learn when to read and write information from
preceding time steps. Although the work is recent, it is clear that LSTM
architectures are highly promising candidates for modeling and forecasting
time series. The overall differences in error indicate that, in terms of
both RMSE and MAE, the LSTM model tended to have greater predictive accuracy
than the ARIMA model.

I. INTRODUCTION

A time series is a sequence of observations recorded in order
over time. Such observations are either continuous over time or
recorded at discrete, equally spaced intervals. Time series
exploration is a useful field of data analysis: it retrieves
information from past observations to establish how a
phenomenon has progressed and to support its prediction
into the long term. The analysis also reveals associations
that may not be obvious to the naked eye and describes the
characteristics needed to understand the phenomenon's
current state. Continuous time series, such as measurements
of brain activity, are generally analyzed by sampling the
series at equally spaced intervals to obtain a discrete time
series. Observations in the series are frequently correlated
in various ways across time steps. When successive
observations are dependent, future values can be predicted
from past observations. Time series analysis is the field of
study in which these associations and correlations are
examined. The correlation may concern the order in which the
data were gathered, the linearity of the model, recurring
trends, etc. [2] The research community has also made a
number of attempts to improve time series exploration and
analysis [3].
A. Prediction in time series:

Predicting a time series is mostly about predicting the
future. Much time series data depends on its own previous
values, and recent history is a strong indicator of how a
variable will behave. A variable, such as an exchange rate,
is regressed on one or more of its own lagged values to
estimate its present and future values. Time series
forecasting relies on numerous models and methods, and is
commonly used to predict different aspects of human
behavior [6]. When forecasting a time series, one primarily
uses the preceding values of the sequence to estimate an
upcoming value. Because past values are used, it is
convenient to map the relationship between the variable y
and the vector of its preceding values. The aim of time
series analysis is to study the patterns in the path of the
series, to construct a model that explains the structure of
the data, and then to forecast future values of the
series [7].
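The idea of regressing a variable on its own lagged values can be sketched as follows. This is a minimal illustration, assuming a single lag and ordinary least squares; the function names `fit_ar1` and `forecast_next` and the sample data are hypothetical and do not come from the paper.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] (one lagged regressor)."""
    x = series[:-1]   # lagged values y[t-1]
    y = series[1:]    # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Hypothetical data generated exactly by y[t] = 2 + 0.5 * y[t-1]
data = [10.0]
for _ in range(9):
    data.append(2 + 0.5 * data[-1])

c, phi = fit_ar1(data)
print(round(c, 6), round(phi, 6), round(forecast_next(data, c, phi), 6))
```

Because the sample data follow the recurrence exactly, the fit recovers the intercept and the lag coefficient; on real data the estimates would only approximate the underlying dependence.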


B. Forecasting in time series: 

Time series forecasting uses the observations of a time
series, together with the results of time series analysis, to
build a predictive model that captures the dependencies. This
model is then used to estimate future values of the series. The
forecast uses only past information to estimate future
values [4]. The quality of a time series forecasting model is
determined by its success in predicting the future. This is
often at the expense of being able to explain why a specific
prediction was made.


II. Forecasting methods in time series analysis.
Forecasting methods can be divided into two main
categories: intuitive and formalized.


[Figure: tree of forecasting methods, with branches for domain models and time series models]
Fig.1. Classification of forecasting methods


Intuitive forecasting methods comprise expert forecasts and
judgments. They are still used today in market research,
economics, politics and other fields whose behavior is
very complicated or hard to predict with mathematical
models.


Formalized methods use mathematical models to predict
future values. They are divided into domain models and
time-series models.


Domain models are models based on domain-specific processes,
laws, and mechanisms. A climate prediction model, for instance,
includes equations of fluid dynamics and thermodynamics.
For software failures, the most common approaches include
software reliability models. Their key disadvantage is that
they do not suit every class of technology, since each focuses
on attributes specific to its domain. A thorough understanding
of the processes, methodologies and technologies of software
design and testing is needed to construct an appropriate
software reliability model and to draw sound conclusions
from such a model.


Time-series models are mathematical forecasting models
that aim to determine, within the series itself, the
dependence of the future value on past values, and to
compute the forecast from this dependence. Such
models are generic across domains, i.e. their general
form does not change with the nature of the time
series [10]. Time series models can be further divided into:
> Regression models;

> Smoothing models;

> Models based on neural networks.


III. Model description:
Basics of Box and Jenkins Time Series Models


[Figure: Box-Jenkins flowchart; recoverable labels: identification (stationarity), diagnosis of model(s)]

The Box-Jenkins time series models are named after the
statisticians George Box and Gwilym Jenkins (Box and
Jenkins, 1970). These models generate forecasts
based on the statistical properties of the observed time
series data, and they have attracted considerable
interest in the fields of organizational analysis, management
science and statistics. The Box-Jenkins ARMA
(Autoregressive Moving Average) model is a combination of
the AR (Autoregressive) and MA (Moving Average) models.
The Box-Jenkins approach assumes that the time series is
stationary, and therefore recommends differencing a
non-stationary series one or more times to obtain a
stationary one. This yields the ARIMA model, the "I"
standing for "Integrated". These models are therefore known
as Autoregressive Integrated Moving Average (ARIMA) models
(Box and Jenkins, 1976).
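The differencing step can be illustrated with a short sketch. It assumes plain first-order differencing applied d times; the helper name `difference` and the quadratic sample trend are illustrative choices, not taken from the paper.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'[t] = y[t] - y[t-1]."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Hypothetical series with a quadratic trend (clearly non-stationary)
trend = [t * t for t in range(8)]

once = difference(trend, d=1)    # still trending (linear)
twice = difference(trend, d=2)   # constant: the trend has been removed
print(once)
print(twice)
```

Each pass shortens the series by one observation, which is one reason the order d is usually kept small.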


There are many ways to forecast using time-series
models; the most commonly used are the following:
> Autoregressive Models (AR). 

> Moving Average Models (MA). 

> Mixed Models (ARMA). 

> Integrated Mixed Models (ARIMA). 


Autoregressive Integrated Moving Average Models:

Box and Jenkins presented ARIMA in their book (Box and
Jenkins, 1970; revised 1976), which received considerable
attention from the scientific community engaged in
statistical analysis at that time. Since then, the approach
has been applied in a wide range of sectors and remains one
of the most reliable models for data analysis and
operational forecasting [3]. ARIMA stands for
Autoregressive (AR) Integrated (I) Moving Average (MA), and
is also known as the Box-Jenkins approach [3]. As
the name indicates, ARIMA (p, d, q) combines three
main components:


AR: Autoregression. A regression model that uses the
relationship between an observation and a number of
lagged observations (p).


I: Integrated. Makes the time series stationary by
differencing the observations, i.e. subtracting from each
observation the one at the previous time step (d times).


MA: Moving Average. An approach that takes into
account the dependence between an observation and
the residual errors of a moving average
model applied to lagged observations (q).
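For reference, the three components can be combined into one equation. The following is the standard textbook form of an ARIMA(p, d, q) model written with the backshift operator $B$ (where $B y_t = y_{t-1}$); the symbols $\phi_i$, $\theta_j$, $c$ and $\varepsilon_t$ are conventional notation assumed here, not taken from the paper.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

The first factor is the AR part of order p, $(1 - B)^d$ applies the differencing d times, and the right-hand side is the MA part of order q acting on the residual errors $\varepsilon_t$.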


The Box-Jenkins method [4] summarizes the ARIMA process
in three main steps:


Identification: the first step is to decompose the time
series into the three processes AR (autoregressive), I
(integrated) and MA (moving average). This phase
determines the parameters p, d and q, after first
testing the series for stationarity. The parameters p and q
are chosen using the autocorrelation and partial
autocorrelation functions, which are addressed in depth
in the implementation section. The parameter
d is the order of differencing.


Estimation: the next phase of the Box-Jenkins method
is to estimate the coefficients of the model for the
given orders p, d and q. The estimates are obtained with
non-linear methods.


Diagnosis: the final stage of the Box-Jenkins process
checks the model's validity, that is, it verifies that the
estimated model fits the available data. Statistical
tests are used for this purpose.
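As a rough illustration of the identification step, the sample autocorrelation function can be computed directly. This sketch assumes the usual sample estimator and the approximate 95% white-noise band of 1.96 divided by the square root of n; the function names and the alternating sample series are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Hypothetical alternating series: strong negative lag-1 autocorrelation
series = [1.0, -1.0] * 10
r = acf(series, 3)
print(r)
print(significant_lags(series, 3))
```

In practice the pattern of significant lags in the autocorrelation and partial autocorrelation plots guides the choice of q and p respectively; the partial autocorrelation is not computed in this sketch.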


LSTM 





[Figure: neural network with an input layer, a hidden layer, and an output layer]


Neural Networks 

ANN: Artificial neural networks (ANNs) are loosely inspired by
the neural interactions of the brain, and attempt to imitate
neuronal circuits and their behavior. ANNs, being robust and
self-adaptive, typically offer satisfactory solutions to
non-linear problems that are only implicitly specified, and
are applied to tasks such as speech recognition, natural
language processing, and forecasting. The architecture
consists of layers between the input and the output, and
increasing the number of layers generally increases the
complexity of the network. The network is trained by feeding
it labeled data and adjusting its weights so that inputs map
to the desired outputs. The weights are scalars that the
network adjusts using the gradient, with respect to each
weight, of the error between the output inferred by the
network and the true expected output. Nevertheless, networks
of this kind are not ideal for sequential data. Since the
network has no memory of earlier time steps, it cannot model
dependencies between them, which makes sequential data
difficult to handle. A memory of some sort is therefore needed.


RNN: Recurrent neural networks (RNNs) chain several
network layers together so that, alongside the
output, information from previous time steps is passed on to
later time steps. As each layer processes its inputs and
parameters, the output of the previous layers is
taken into account, giving the network a form of memory [15].
The gradient can become difficult to handle when the time
dimension has to be taken into account, as it is for RNNs:
information from earlier time steps either begins to fade
away or becomes greatly amplified. These
phenomena are known as the vanishing and exploding
gradient.
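A toy calculation shows why gradients vanish or explode over many time steps. It assumes a deliberately simplified scalar linear recurrence h_t = w * h_{t-1}, for which the gradient of h_T with respect to h_0 is w multiplied by itself T times; the function name `backprop_factor` is hypothetical.

```python
def backprop_factor(weight, steps):
    """Product of `steps` identical per-step gradient factors (weight ** steps)."""
    factor = 1.0
    for _ in range(steps):
        factor *= weight
    return factor


# Compare a contracting, a neutral and an expanding recurrent weight
for w in (0.5, 1.0, 1.5):
    print(w, backprop_factor(w, 50))
```

With a weight below one the factor shrinks towards zero (vanishing gradient), and with a weight above one it grows without bound (exploding gradient); real RNNs involve matrices and non-linearities, but the same multiplicative effect applies.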


LSTM: LSTM networks, comparable to computer
memory, use a gated cell to store information. Like
the previously mentioned networks, LSTM cells also
learn when to read and write information from preceding
time steps [15, 17]. The LSTM model thereby
addresses the vanishing and exploding gradient problem
and helps the network to accurately recall data from far back
in the sequence [8, 16].


The LSTM architecture is fairly intricate to describe.
The Long Short-Term Memory (LSTM) network is
the most widely used architecture for overcoming the
vanishing gradient problem. Sepp Hochreiter and
Jürgen Schmidhuber proposed this network structure in
1997 [19]. With the rise of deep architectures, LSTM
has become widely used in sequence learning
and has shown considerable experimental strength [20]. Each
LSTM is a collection of cells, or system modules, which capture
and store the data streams. The cells resemble a transport
line (the upper line in each cell) that carries data from the
past to the present, from one module to the next.
Thanks to the gates in each cell,
data can be discarded, filtered, or added for the
following cells. The gates, which are built
on a sigmoid neural network layer, allow the cells to
let data pass through or be discarded as needed. Each
sigmoid layer outputs numbers between zero and
one, indicating how much of each data component should
be let through. More specifically, a value of zero
means "let nothing pass through", while a
value of one means "let everything pass through".
Each LSTM has three types of gates whose
purpose is to control the state of each cell [24]:
> Forget Gate outputs a number between 0 and 1, where 1
means "completely keep this" and 0 means
"completely ignore this."
> Memory Gate (the input gate) determines which new data is
stored in the cell. First, a sigmoid layer, called the "input
gate layer," decides which values are to be updated. A
tanh layer then creates a vector of new candidate values
that can be added to the state.
> Output Gate determines what each cell outputs.
The output value is based on the cell state, together
with the filtered and newly added data.
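The three gates described above correspond to the standard LSTM update equations, reproduced here as a reference sketch. The notation ($W$, $U$, $b$ for input weights, recurrent weights and biases, $\sigma$ for the sigmoid, $\circ$ for element-wise multiplication) is the conventional one and is an assumption of this sketch rather than the paper's own formulation.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate values)} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
h_t &= o_t \circ \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive update of the cell state $c_t$ is what lets information, and gradients, travel across many time steps without being repeatedly squashed.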


This architecture has many variants; there are
nearly as many modifications as there are articles that
use it. A simplified version of LSTM called the GRU (Gated
Recurrent Unit) was introduced in 2014 [22]. It has the
benefit of being less computationally complex, since it has
fewer parameters and equations, yet it remains as
powerful as its predecessor. The following equations
govern the model's dynamics, where $\sigma$ is the sigmoid
function and $\circ$ denotes element-wise multiplication:

> $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$ (update gate)

> $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$ (reset gate)

> $h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tanh(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h)$
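The update and reset gate equations above can be traced with a scalar sketch. It assumes a single input feature and a single hidden unit, so that every weight is a plain number; the parameter dictionary and its values are hypothetical and chosen only to make the behaviour easy to follow.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One scalar GRU update; p holds the weights W*, U* and biases b*."""
    z = sigmoid(p["Wz"] * x + p["Uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["Wr"] * x + p["Ur"] * h_prev + p["br"])   # reset gate
    h_cand = math.tanh(p["Wh"] * x + p["Uh"] * (r * h_prev) + p["bh"])
    # Blend the previous state and the candidate state using the update gate
    return z * h_prev + (1.0 - z) * h_cand


# Hypothetical parameters: gates at 0.5, candidate depends on the input only
params = {k: 0.0 for k in ("Wz", "Uz", "bz", "Wr", "Ur", "br", "Wh", "Uh", "bh")}
params["Wh"] = 1.0

h = 0.0
for x in [1.0, 0.5, -0.5]:
    h = gru_step(x, h, params)
    print(round(h, 4))
```

When the update gate saturates near one, the previous state is carried over almost unchanged, which is how the unit preserves information over long spans.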

Interest in LSTM has grown sharply in recent years, and its
predictive accuracy has demonstrated its
strength in nearly every area of science. It took about two
decades for the approach to become mainstream. To build
an LSTM model, we first have to convert the training and
test datasets into a three-dimensional array of
"samples, time steps, and features". We use an input
layer, a hidden layer of LSTM blocks and a single output
layer. The following principles are used to
determine the number of layers and the number of neurons in
each layer:
> Input layer: there is a single input layer; the number
of neurons in it is given by the number of
input columns (features).
> Output layer: each neural network has a single output layer.
> Hidden layer: the size of this layer, that is to say the
number of neurons, remains to be determined.
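The conversion into a three-dimensional structure can be sketched with a sliding window. This is a minimal illustration assuming a univariate series and one-step-ahead targets; the helper name `make_windows` and the nested-list layout [samples][time steps][features] are assumptions of this sketch.

```python
def make_windows(series, time_steps):
    """Turn a 1-D series into (X, y) with X shaped [samples][time_steps][1]."""
    X, y = [], []
    for start in range(len(series) - time_steps):
        window = series[start:start + time_steps]
        X.append([[value] for value in window])  # one feature per time step
        y.append(series[start + time_steps])     # next value is the target
    return X, y


# Hypothetical series of five observations, windows of three time steps
X, y = make_windows([10, 20, 30, 40, 50], time_steps=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features
print(y)
```

Deep learning libraries typically expect this same layout as a numeric array, so the nested lists would be converted before training.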


GAN: Generative adversarial networks, or GANs for short,
are an approach to generative modeling that uses
deep learning techniques such as convolutional neural
networks. GANs are an ingenious way of training a
generative model by framing the task as a supervised
learning problem with two sub-models: the generator
model and the discriminator model. The two models are
trained together.


Naive Bayes is an example of a generative model that is
more often used as a discriminative model. Naive Bayes
works by summarizing the probability distribution of
each input variable and of the output class. When a prediction
is made, the probability of each possible outcome is computed
for each variable, the individual probabilities are
combined, and the most likely outcome is predicted.


Other examples of generative models include Latent 
Dirichlet Allocation, or LDA, and the Gaussian Mixture Model, 
or GMM. 


RBM: A Restricted Boltzmann Machine is a generative
stochastic artificial neural network that can learn a
probability distribution over its set of inputs. RBMs were
first introduced in 1986 by Paul Smolensky under the name
Harmonium. As their name suggests, RBMs are a variant of
Boltzmann machines, with the restriction that their neurons
must form a bipartite graph. This restriction allows
more efficient training algorithms than those
available for the general class of Boltzmann machines, in
particular the gradient-based contrastive divergence
algorithm.


DBN: A Deep Belief Network is an unsupervised probabilistic
deep learning algorithm made up of multiple layers of latent
stochastic variables. A DBN is a hybrid generative
graphical model. The top two layers are undirected, while
the lower layers have directed connections from the
layers above. DBNs are pre-trained using a greedy algorithm.
By suitably initializing the weights of all the layers,
pre-training improves optimization. DBNs are used in image
recognition, video sequences, motion capture data and
speech recognition.


FORECAST PERFORMANCE METRICS (Common Accuracy 
Statistics) 

RMSE: The root mean squared error (RMSE) is the square root
of the residual variance. It shows the model's absolute fit
to the data, i.e. how close the observed data points are to
the values predicted by the model. As a rule of thumb, RMSE
values between 0.2 and 0.5 indicate that the model predicted
the data reasonably well. Note that RMSE is expressed in the
units of the data, so such thresholds only make sense on a
comparable (e.g. normalized) scale. An adjusted R-squared
above 0.75, however, is a very good indication of accuracy.
In some cases an adjusted R-squared of 0.4 or more is also
acceptable.


MSE: The mean squared error (MSE) is computed as the
average of the squared forecast errors. Squaring the
forecast errors makes them positive; it also puts
greater weight on large errors. A larger MSE implies that
the data values are widely scattered around their central
moment (mean), while a smaller MSE indicates the opposite and
is the preferred outcome, because it
shows that the data values lie close to their
central moment.


MAPE: The mean absolute percentage error (MAPE) is the
mean of the absolute percentage forecast errors.
The error is defined as the actual (observed) value minus
the forecast value. Percentage errors are summed
without regard to sign when computing MAPE. The measure
is easy to interpret because it expresses the error in percentage
terms. Furthermore, since absolute percentage errors are
used, the problem of positive and negative errors
cancelling each other out is avoided. MAPE therefore has
practical appeal and is widely used in forecasting.
The smaller the MAPE, the better the forecast.
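The accuracy statistics above, together with the MAE used later in the conclusions, can be computed as in the following sketch. It assumes the usual textbook definitions, with error taken as actual minus forecast; the sample values are hypothetical.

```python
import math


def mse(actual, forecast):
    """Mean of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Square root of the MSE, in the units of the data."""
    return math.sqrt(mse(actual, forecast))


def mae(actual, forecast):
    """Mean of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical observed values and forecasts
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mse(actual, forecast), rmse(actual, forecast))
print(mae(actual, forecast), mape(actual, forecast))
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE is scale-free, which is why studies often report several of these measures side by side.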


RELATED WORKS: 

A comparative study of LSTM and ARIMA was carried out
by Ajila Elmasdotter for retail sales forecasting.
Furthermore, ARIMA is a commonly used
time series methodology for financial time series [23]. In
econometrics, numerous ARIMA models have been
developed to analyze and forecast stock markets [24-26]. In
addition, ARIMA is used to forecast water
consumption [27, 28] and electricity demand [29].
More recently, there has been a great deal of interest in the
use of deep learning models [30, 31], particularly recurrent
models such as the LSTM neural network, for forecasting
financial time series, particularly in the stock market [34].
The authors of [32] modeled and predicted China stock
returns using an LSTM architecture, with a reported accuracy
of 27.2 percent. The authors of [33] studied the usefulness of
recurrent neural networks for predicting stock market price
movements. The Shanghai Composite Index and the Dow Jones
Index were subsequently considered in [34]. With regard to
financial analysis and forecasting, hardly any published
research compares, or even implements, the two techniques;
most of the work has been applied to stock prices,
as discussed above. This is largely due to the difficulty of
obtaining specific data sets and the unreliable nature of such
data. None of these authors compared the performance of
LSTM and ARIMA models.


MAJOR DIFFERENCES BETWEEN ARIMA AND LSTM:
Although the work is recent, it is clear that LSTM
architectures are highly promising candidates
for modeling and forecasting time series. The
major differences between LSTM and ARIMA are summarized
below. Using RNNs, and the LSTM
architecture in particular, involves many parameters that need
to be tuned to achieve the best performance in
forecasting tasks. The difficulty lies in selecting the
appropriate parameters and identifying the right architecture
for the task. An ARIMA model is easy to configure and
provides reasonable performance; it only requires
determining the parameters p, d and q, where p is
the order of the autoregressive part (AR), d the order of
differencing and q the order of the moving average
part (MA).

ARIMA:
> ARIMA is a linear model.

> A small amount of data is sufficient for ARIMA (i.e., the
minimum number of observations is 50-100).

> ARIMA is a parametric model, which is to say that for each
series we have to define the parameters p, d and q.

> ARIMA is dedicated only to time series analysis.

LSTM:
> LSTM is a nonlinear model.

> LSTM can store and process large amounts of data.

> LSTM is a non-parametric model, but it requires
tuning of some hyperparameters.

> LSTM models can process other sequential data as well.





Evaluation measures of real world time series data:
Comparison of Software:


ARIMA            LSTM

> EViews         > Python
> Julia          > Rnn
> Mathematica    > TensorFlow
> MATLAB         > IBM SPSS
> NCSS           > MATLAB
> Python         > SAS
> R              > Julia
> Ruby           > Ruby
> SAS
> IBM SPSS
> Stata

CONCLUSIONS: 


With the recent advances in machine-learning-based
approaches, and especially deep learning
algorithms, these methods are gaining popularity among
researchers across diverse disciplines. The major question
is then how accurately and
effectively these newly developed techniques compare with
conventional methods. The overall
differences in error indicate that, in terms of both RMSE
and MAE, the LSTM model tended to have greater predictive
accuracy than the ARIMA model. All in all, we have shown
that neural networks provide more accurate long-term
predictions. Since the error of the ARIMA model
proposed by Box and Jenkins is not significantly higher, we note
that it remains a useful tool for time series
prediction. We also saw that the two models have complementary
strengths: ARIMA is more precise on seasonal peaks, while
LSTM fits the stable portion of the series better. In
some developed countries, a dysfunctional economic system
makes it difficult to build reliable prediction models. New
RNN-based approaches open up the possibility of obtaining
better estimates without sacrificing
the complexity of the data through differencing and
linearization.